A CNN Solution for Malaria Detection¶

Report¶

Executive Summary¶

Over 229 million cases and 409,000 fatalities were expected to be attributed to malaria in 2020, even though it is both preventable and treatable.

Analyzing blood films with a microscope, the standard approach for detecting malaria, is labor-intensive, time consuming and requires expert knowledge. Data exploration revealed a distinctive feature found in infected cells - a purpe discoloration. This project proposes a Convolutional Neural Network (CNN) model for malaria diagnosis employing microscopic cell images. Six different iterations of the model were created and tested, with a sigmoid activation model proving to be the most successful due to its high accuracy (98.3%) and low number of false negatives.

Computational infrastructure, data collecting and labeling for model improvement, integration with existing healthcare systems, and user training are all costly endeavors that must be undertaken before widespread adoption can occur. The application saves time and money compared to traditional methods of diagnosing malaria using blood films, and it improves diagnostic accuracy. The possible risks are a lack of privacy and security, noncompliance with regulations, poor data quality and insufficient infrastructure.

Future research should focus on refining the model through transfer learning, expanding disease detection capabilities with multiclass classification models, and keeping a close eye on the model's performance.

Problem Summary¶

Malaria is a potentially fatal illness that is caused by the bites of female Anopheles mosquitoes carrying harmful parasites. Despite being preventable and treatable, it continues to be a serious worldwide health issue. In 2020, the World Health Organization predicts that over half of humanity will be at danger of contracting malaria, which will cause an estimated 229 million cases and 409,000 fatalities worldwide. Particularly in Sub-Saharan Africa, where over two-thirds of malaria deaths each year are caused by children under five, the disease mostly affects the most vulnerable and impoverished people.

To lessen the severity of the illness and avoid fatalities, malaria treatment and early detection are essential. However, access to diagnostic procedures and medical care is constrained in many areas where malaria is widespread. In order to avoid overusing antimalarial medications, which can result in drug resistance, accurate and prompt identification of malaria is also essential. Examining blood films under a microscope is the classic approach for diagnosing malaria. Accurately identifying and counting malaria parasites demands a great deal of knowledge. Additionally, it may take a while, which could cause a delay in both diagnosis and therapy.

Therefore, the development of effective, reliable, and easily usable malaria detection techniques is urgently needed. A model for malaria diagnosis using microscopic pictures could be a game-changing answer by utilizing developments in machine learning and image processing. In addition to lightening the load on medical staff, it would speed up diagnosis and result in timely and effective treatment. As a result, there is a chance that this will drastically lower malaria morbidity and mortality.

Such a solution might offer significant socioeconomic advantages in addition to the immediate health effects. In areas where malaria is a serious problem, it might increase productivity and economic growth as healthier communities are better able to support their local economy. A successful machine learning model may also be used as a model for combating other infectious illnesses, opening up fresh possibilities in the struggle against dangers to world health.

Solution Summary¶

The proposed solution employs a Convolutional Neural Network (CNN) model for image classification, which is well suited for this issue due to its demonstrated performance in image recognition tasks and its capacity to preserve the spatial relationships between pixels, an important consideration when identifying distinctive features within microscopic images of cells.

In total, 6 CNN models were designed with varying specifications. As seen in Figure 1, Image inputs were also manipulated to verify if they would have any impact on the accuracy of the models by making the distinctive features of each type of cell more detectable. From the image dataset, it was clear that the infected cells had a distinctive purple discoloration that needed to be understood by the model.

FIgure 1 - Cell images.png

The metrics of success used to determine the best model were a combination of accuracy ranking and an analysis of the types of classification mistakes made by the models (minimizing false negatives). In this problem statement, it is important that the models are accurate when classifying cells as infected and non-infected. It is also highly important that the model minimizes the instances where it incorrectly diagnoses an infected cell as healthy (referred to as a false negative) to avoid scenarios where infected patients are left undiagnosed and untreated.

Figure 2 shows the performance of the various models compared to each other. The highlighted model is the proposed solution which boasts an impressive accuracy of 98.3% while also having only 12 instances of false negatives from a test set of 2600 images. Despite model2 having an accuracy that matches model1’s, there are more instances of model2 classifying an infected cell as healthy and hence model1 is the preferable solution for this problem statement. Figure 2 also highlights that there was a negative impact on accuracy when the images were manipulated (model3 and model_hsv).

Figure 2 - Comparison of models.png

Model1 uses the original images and treats the problem as a binary classification problem, meaning that the output can only be one of two things (hence the sigmoid activation in the output layer).

This ML model can greatly contribute to the diagnostic process, allowing for timely and proper treatment of patients, and possibly saving lives, by providing a robust, accurate, and rapid solution for malaria detection. The disease can also help reduce the stress on medical staff in areas where malaria is prevalent. This model's portability across a wide range of computing platforms and speed in handling massive volumes of tests means it can be used to offer accurate diagnoses to previously unreached places.

Recommendations for implementation¶

Deep learning models require significant computational resources. Implementing this model will require setting up an infrastructure that makes these resources available in a cost-effective and scalable way so that it can have widespread use. Furthermore, integration into existing healthcare information systems will require the assistance of healthcare IT experts but would ultimately assist in seamless adoption and use by healthcare professionals. Future-proofing this model will require maintenance as new image data over time can impact the accuracy of detection. Drug-resistant parasites may evolve over time to infect healthy cells in new manners and therefore may produce other distinctive features that need to be taught to the model. User training on how to input data in a standardized format will also assist in the maintenance and use of the model. In summary, stakeholders are required to invest in the necessary computational infrastructure, continue to collect and label images for model improvement, work with healthcare IT experts for integration into existing systems and train professionals on how to use the system.

Hosting this model on AWS would cost around 10,000 USD annually per 100,000,000 predictions (0.10 USD per 1000 predictions according to Amazon ML quotes). The cost of integration into existing systems would be the labor cost of software engineers or a consulting firm and would be dependent on the complexity of the existing systems. User training could be conducted using instructional videos with online distribution which could be made the responsibility of the integration team as well. Maintaining and improving the model would require a data scientist or a machine learning engineer and so the cost would the yearly labor cost of one professional (~ $120,000).

The benefit of implementing this model would be a faster, more accurate, and more cost-effective diagnosis of malaria. The labor cost and time required for lab technicians to conduct manual tests of blood films would be greatly reduced with an increase in the accuracy of correct diagnosis.

Risk Analysis¶

Ensuring patient health data privacy is a critical risk to this solution. Mishandling of patient data due to insecure protocols can have legal repercussions as well as personal repercussions for those patients involved. Robust security measures must be upheld to maintain the integrity of this data.

Data quality must also be monitored as the performance of the model is highly dependent on the data that it is trained on. Inaccurate data can negatively impact the model’s performance over time leading to misdiagnosis. User training and additional monitoring is required to ensure the data is of good quality.

For low-resource locations where computing resources or internet connectivity is limited, the implementation of this model may be more technically challenging. Local solutions would have to be available for adoption in such settings and may cost more than the cloud alternative.

Future Considetations¶

The most immediate consideration is to attempt to improve the model through transfer learning. There are existing models that have been trained on many more cell images than the proposed solution. These pre-trained models can be used to reduce the training time and data requirements for improved accuracy. Furthermore, the model can be extended to identify more than one type of disease. Multiclass classification models could be developed to assist with more types of diagnosis and would also help with the adoption of the model as its value to healthcare professionals increases from one domain to many domains. Lastly, performance monitoring is key to future proofing this model. Analysis of its performance over time can prevent accuracy drops or other functional issues.

References¶

Amazon (2023) Getting started, Amazon. Available at: https://aws.amazon.com/getting-started/projects/build-machine-learning-model/services-costs/#:~:text=The%20monthly%20price%20for%20Amazon,%2F1000)%20*%20890%2C000).

World Health Organization, 2022. World malaria report 2022. World Health Organization.

Code¶

Problem Definition¶

The context: Why is this problem important to solve?
Malaria is a life-threatening disease caused by parasites that are transmitted to people through the bites of infected female mosquitoes.More accurate detection of infected cells can ensure that patients who are suffering from malaria receive the medical attention they require. In scenarios where malaria is in its early stages, it may be harder for the human eye to tell from glance whether a cell is infected or not. ML models may be able to have a higher predictive accuracy and therefore ensure that patients are not misdiagnosed. Developing a model for quick and efficient detection of malaria parasites in cells can be a significant step towards combating this disease.

The objectives: What is the intended goal?

Develop a ML model that has a high predictive accuracy of whether a cell image is of an infected or uninfected cell with as few false negatives as possible. This could potentially assist healthcare providers in diagnosing the disease, reducing the time and increasing the accuracy of diagnosis, and helping to provide appropriate treatment more quickly.

The key questions: What are the key questions that need to be answered?

What are the distinctive features separating the cells? What kind of classification is required? What kind of architecture can best predict infected vs uninfected cells accurately? What kind of preprocessing is required to improve the accuracy of the models? Can the developed model generalize well for unseen data?

The problem formulation: What is it that we are trying to solve using data science?

We are trying to solve binary classification problem where the two classes are "parasitized" and "uninfected". From input of a dataset of images of these two types of cells, we want to train a model to learn from this data set and make accurate predictions on unseen data. We want the model to be as accurate as possible while reducing false positives and especially false negatives.

Data Description ¶

There are a total of 24,958 train and 2,600 test images (colored) that we have taken from microscopic images. These images are of the following categories:

Parasitized: The parasitized cells contain the Plasmodium parasite which causes malaria
Uninfected: The uninfected cells are free of the Plasmodium parasites

Mount the Drive

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Loading libraries¶

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import zipfile
import os
from PIL import Image


import warnings
warnings.filterwarnings('ignore')

Let us load the data¶

In [ ]:
path = '/content/drive/MyDrive/Capstone Project/cell_images.zip'

# Zip extraction
with zipfile.ZipFile(path, 'r') as zipExtract:
    zipExtract.extractall()

The extracted folder has different folders for train and test data contains the different sizes of images for parasitized and uninfected cells within the respective folder name.

The size of all images must be the same and should be converted to 4D arrays so that they can be used as an input for the convolutional neural network. Also, we need to create the labels for both types of images to be able to train and test the model.

Let's do the same for the training data first and then we will use the same code for the test data as well.

In [ ]:
# Function to load, resize, and label images
def process_images(data_dir, SIZE=64):
    image_data = []
    labels = []

    # Dictionary for label mapping
    label_mapping = {'/parasitized/': 1, '/uninfected/': 0}

    # Iterate through each folder in the directory
    for folder_name in ['/parasitized/', '/uninfected/']:
        # Get list of all files in the folder
        image_files = os.listdir(data_dir + folder_name)

        # Iterate through each image in the folder
        for image_name in image_files:
            try:
                # Load and resize the image
                image = Image.open(data_dir + folder_name + image_name)
                image = image.resize((SIZE, SIZE))

                # Convert the image to array and append to list
                image_data.append(np.array(image))

                # Append the label
                labels.append(label_mapping[folder_name])

            except Exception as e:
                print("Error:" +  {image_name} + " " +  {e})

    return np.array(image_data), np.array(labels)


#Storing the path of the extracted "train" and "test" folders
train_dir = '/content/cell_images/train'
test_dir = '/content/cell_images/test'

#Process the training and testing data
train_images, train_labels = process_images(train_dir)
test_images, test_labels = process_images(test_dir)
In [ ]:
train_images = np.array(train_images)
train_labels = np.array(train_labels)
test_images = np.array(test_images)
test_labels = np.array(test_labels)

Check the shape of train and test images

In [ ]:
print("Shape of train images: ", train_images.shape)
print("Shape of test images: ", test_images.shape)
Shape of train images:  (24958, 64, 64, 3)
Shape of test images:  (2600, 64, 64, 3)

Check the shape of train and test labels

In [ ]:
print("Shape of train labels: ", train_labels.shape)
print("Shape of test labels: ", test_labels.shape)
Shape of train labels:  (24958,)
Shape of test labels:  (2600,)

Observations and insights:

There are 24,958 images in the training dataset and 2600 images in the test dataset.

Images Dimensions - 64x64x3 (height x width x colours(RGB so its 3))

Labels Dimensions - 1D array filled with 1/0 depending on whether the image represents parisitized or uninfected cells.

Check the minimum and maximum range of pixel values for train and test images

In [ ]:
print("Train images:")
print("Min pixel value: ", np.min(train_images))
print("Max pixel value: ", np.max(train_images))

print("Test images:")
print("Min pixel value: ", np.min(test_images))
print("Max pixel value: ", np.max(test_images))
Train images:
Min pixel value:  0
Max pixel value:  255
Test images:
Min pixel value:  0
Max pixel value:  255

Observations and insights:

The images are in a standard format where 0 represents black, 255 represents white, and values in between represent varying shades of colors.

The range is consistent across both train and test datasets, which implies that the train and test sets have similar characteristics.

Count the number of values in both uninfected and parasitized

In [ ]:
#In both label arrays, the label "1" respresents parasitized cells and the label "0" represents uninfected cells, so we can sum instances of each label within the array to count them.
num_parasitized_train = np.sum(train_labels == 1)
num_uninfected_train = np.sum(train_labels == 0)

num_parasitized_test = np.sum(test_labels == 1)
num_uninfected_test = np.sum(test_labels == 0)

print("Training set: " + str(num_parasitized_train) + " parasitized, " + str(num_uninfected_train) + " uninfected")
print("Test set: " + str(num_parasitized_test) + " parasitized, " + str(num_uninfected_test) + " uninfected")
Training set: 12582 parasitized, 12376 uninfected
Test set: 1300 parasitized, 1300 uninfected

Normalize the images

In [ ]:
train_images = train_images / 255.0
test_images = test_images / 255.0

Observations and insights:

All pixel values will now be in the range [0,1]. Reducing the scale of pixel values will help the model to learn and converge faster.

Plot to check if the data is balanced

In [ ]:
# Create a list with our labels
labels = ['Uninfected', 'Parasitized']

# Count the occurrences of each class in the train set
counts_train = [np.sum(train_labels == 0), np.sum(train_labels == 1)]

# Count the occurrences of each class in the test set
counts_test = [np.sum(test_labels == 0), np.sum(test_labels == 1)]

# Create a figure and a set of subplots
fig, ax = plt.subplots()

# Define bar width
bar_width = 0.35

# Create bar plot for training data
rects1 = ax.bar(np.arange(len(labels)), counts_train, bar_width, label='Train')

# Create bar plot for test data
rects2 = ax.bar(np.arange(len(labels)) + bar_width, counts_test, bar_width, label='Test')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_xlabel('Classes')
ax.set_ylabel('Count')
ax.set_title('Counts by class and dataset')
ax.set_xticks(np.arange(len(labels)) + bar_width / 2)
ax.set_xticklabels(labels)
ax.legend()

# Display the plot
plt.show()

Observations and insights:

The size of the bars are approximately equal. This means there is roughly the same number of instances of each class (parasitized and uninfected) in the dataset and indicates that the dataset is balanced.

Data Exploration¶

Let's visualize the images from the train data

In [ ]:
#Displaying 20 random images from the train dataset alongside the labels
plt.figure(figsize=(10,10))
for x in range(20):
    plt.subplot(5,5,x+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    #random integer being chosen so that both parasitized and uninfected cells can be seen
    i = int(np.random.randint(0, train_images.shape[0], 1))
    plt.imshow(train_images[i])
    if train_labels[i]:
        plt.xlabel("Parasitized")
    else:
        plt.xlabel("Uninfected")
plt.show()

Observations and insights:

Parastized cells have a distinct discolouration of pink/purple in their cells compared to colour scheme of their surroundings wheras uninfected cells have relatively uniform colours throughout the cells.

Visualize the images with subplot(6, 6) and figsize = (12, 12)

In [ ]:
# Displaying 36 random images from the train dataset alongside the labels
plt.figure(figsize=(12,12))
for x in range(36):  # Changed the number of iterations to 36
    plt.subplot(6,6,x+1)  # Changed to a 6x6 grid
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    # random integer being chosen so that both parasitized and uninfected cells can be seen
    i = int(np.random.randint(0, train_images.shape[0], 1))
    plt.imshow(train_images[i])
    if train_labels[i]:
        plt.xlabel("Parasitized")
    else:
        plt.xlabel("Uninfected")
plt.show()

Observations and insights:

As above, parastized cells have a distinct discolouration of pink/purple in their cells compared to colour scheme of their surroundings wheras uninfected cells have relatively uniform colours throughout the cells.

Plotting the mean images for parasitized and uninfected

In [ ]:
#Separating the images based on the labels
parasitized_images = train_images[train_labels == 1]
uninfected_images = train_images[train_labels == 0]

#Calculating mean images
mean_parasitized = np.mean(parasitized_images, axis=0)
mean_uninfected = np.mean(uninfected_images, axis=0)

Mean image for parasitized

In [ ]:
plt.figure(figsize=(10,5))

plt.subplot()
plt.imshow(mean_parasitized)
plt.title('Mean Parasitized')

plt.show()

Mean image for uninfected

In [ ]:
plt.figure(figsize=(10,5))

plt.subplot()
plt.imshow(mean_uninfected)
plt.title('Mean Uninfected')

plt.show()

Observations and insights:

The mean image of the parasitized cell is a slightly darker hue of pink than the infected cell. The difference suggests that there are distinctive visual features that might be useful for the ML model to learn to make accurate predictions. It also indicates that the mean image alone could potentially differentiate between parasitized and uninfected cells. The mean image may have good predictive power when used as inputs to an ML model for malaria detection.

Converting RGB to HSV of Images using OpenCV

Converting the train data

In [ ]:
import cv2

# Converting train images to HSV
train_images_hsv = []
for image in train_images:
    # Converting image to float32 datatype
    image = image.astype(np.float32)
    # Converting image from RGB to HSV
    image_hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    train_images_hsv.append(image_hsv)

# Converting the list to a numpy array
train_images_hsv = np.array(train_images_hsv)
In [ ]:
#Visualizing the images after conversion

# Generating 3 random indices
random_indices = np.random.choice(len(train_images_hsv), size=3, replace=False)

# Creating 3 subplots
fig, axs = plt.subplots(1, 3, figsize=(12, 4))

# Iterating over the subplots and randomly selected indices
for i, ax in zip(random_indices, axs):

    #Finding labels by comparing to train labels
    if train_labels[i] == 1:
        img_label = "Parasitized"
    else:
        img_label = "Uninfected"

    # Display the image in HSV format
    ax.imshow(train_images_hsv[i])
    ax.set_title("Index:" + str(i) + " " + img_label)
    ax.axis("off")

plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Converting the test data

In [ ]:
# Converting test images to HSV
test_images_hsv = []
for image in test_images:
    # Converting image to float32 datatype
    image = image.astype(np.float32)
    # Converting image from RGB to HSV
    image_hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    test_images_hsv.append(image_hsv)

# Converting the list to a numpy array
test_images_hsv = np.array(test_images_hsv)
In [ ]:
#Visualizing the images after conversion

# Generating 3 random indices
random_indices = np.random.choice(len(test_images_hsv), size=3, replace=False)

# Creating 3 subplots
fig, axs = plt.subplots(1, 3, figsize=(12, 4))

# Iterating over the subplots and randomly selected indices
for i, ax in zip(random_indices, axs):

    #Finding labels by comparing to test labels
    if test_labels[i] == 1:
        img_label = "Parasitized"
    else:
        img_label = "Uninfected"

    # Displaying the image in HSV format
    ax.imshow(test_images_hsv[i])
    ax.set_title("Index:" + str(i) + " " + img_label)
    ax.axis("off")

plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Observations and insights:

By converting to HSV, the color information is separated into distinct channels: hue, saturation, and value which facilitates color-based analysis, such as color segmentation or object detection. The above image are an example of how there is a clearer distinction now between uninfected cells and parastisized cells.

Processing Images using Gaussian Blurring

Gaussian Blurring on train data

In [ ]:
import cv2

# Apply Gaussian blurring to train images
train_images_blurred = []
for image in train_images:
    # Converting image to float32 datatype
    image = image.astype(np.float32)
    # Applying Gaussian blur with a kernel size of (5, 5) and sigma of 0
    blurred_image = cv2.GaussianBlur(image, (5, 5),0)
    train_images_blurred.append(blurred_image)

# Convert the list to a numpy array
train_images_blurred = np.array(train_images_blurred)
In [ ]:
#Visualizing the images after conversion

# Generating 3 random indices
random_indices = np.random.choice(len(train_images_blurred), size=3, replace=False)

# Creating 3 subplots
fig, axs = plt.subplots(1, 3, figsize=(12, 4))

# Iterating over the subplots and randomly selected indices
for i, ax in zip(random_indices, axs):

    #Finding labels by comparing to train labels
    if train_labels[i] == 1:
        img_label = "Parasitized"
    else:
        img_label = "Uninfected"

    # Display the image in HSV format
    ax.imshow(train_images_blurred[i])
    ax.set_title("Index:" + str(i) + " " + img_label)
    ax.axis("off")

plt.show()

Gaussian Blurring on test data

In [ ]:
# Apply Gaussian blurring to train images
test_images_blurred = []
for image in train_images:
    # Converting image to float32 datatype
    image = image.astype(np.float32)
    # Applying Gaussian blur with a kernel size of (5, 5) and sigma of 0
    blurred_image = cv2.GaussianBlur(image, (5, 5),0)
    test_images_blurred.append(blurred_image)

# Convert the list to a numpy array
test_images_blurred = np.array(test_images_blurred)
In [ ]:
#Visualizing the images after conversion

# Generating 3 random indices
random_indices = np.random.choice(len(test_images_blurred), size=3, replace=False)

# Creating 3 subplots
fig, axs = plt.subplots(1, 3, figsize=(12, 4))

# Iterating over the subplots and randomly selected indices
for i, ax in zip(random_indices, axs):

    #Finding labels by comparing to train labels
    if train_labels[i] == 1:
        img_label = "Parasitized"
    else:
        img_label = "Uninfected"

    # Display the image in HSV format
    ax.imshow(test_images_blurred[i])
    ax.set_title("Index:" + str(i) + " " + img_label)
    ax.axis("off")

plt.show()

Observations and insights: *¶

The dataset doesn't have a lot of noise to begin with and so the gaussian blurring approach, in an attempt to remove noise from the images, may make it harder for the ML models to differentiate distintive characteristics between the two types of cells.

Instead, we can attempt data augmentation to improve the ML's ability to detect the differentiative characteristics regardless of orientation and therefore improve the robustness of the system.

We could also try transfer learning by using a model that has been pretrained on a larger set of cell images. This could improve classification accuracy

Model Building¶

Base Model¶

Importing the required libraries for building and training our Model

In [ ]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras import backend
import random
In [ ]:
#Clearing the backend session
backend.clear_session()
In [ ]:
# Setting the random seed to standardize outputs
np.random.seed(25)
random.seed(25)
tf.random.set_seed(25)

One Hot Encoding the train and test labels

In [ ]:
train_labels = to_categorical(train_labels, 2)
test_labels = to_categorical(test_labels, 2)

Building the model

In [ ]:
#Define the model
model = Sequential()

#Add convolutional layers
model.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size = 2))

model.add(Dropout(0.2))

model.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation='relu'))
model.add(MaxPooling2D(pool_size = 2))

model.add(Dropout(0.2))

model.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation='relu'))
model.add(MaxPooling2D(pool_size = 2))

model.add(Dropout(0.2))

#Flatten to convert 2D features to 1D vector for fully connect layers
model.add(Flatten())

#Add dense layers
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

# Print the model summary
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 64, 64, 32)        416       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 32)       0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 32, 32, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4128      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 dropout_1 (Dropout)         (None, 16, 16, 32)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 32)        4128      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 8, 8, 32)         0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 8, 8, 32)          0         
                                                                 
 flatten (Flatten)           (None, 2048)              0         
                                                                 
 dense (Dense)               (None, 128)               262272    
                                                                 
 dropout_3 (Dropout)         (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 2)                 258       
                                                                 
=================================================================
Total params: 271,202
Trainable params: 271,202
Non-trainable params: 0
_________________________________________________________________

Compiling the model

In [ ]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Using Callbacks

In [ ]:
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),
    ModelCheckpoint(filepath='.mdl_wts.hdf5', monitor='val_loss', save_best_only=True)
]

Fit and train our Model

In [ ]:
history = model.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=callbacks, epochs=10)
Epoch 1/10
780/780 [==============================] - 134s 169ms/step - loss: 0.4578 - accuracy: 0.7667 - val_loss: 0.1766 - val_accuracy: 0.9381
Epoch 2/10
780/780 [==============================] - 130s 167ms/step - loss: 0.1469 - accuracy: 0.9519 - val_loss: 0.1597 - val_accuracy: 0.9431
Epoch 3/10
780/780 [==============================] - 129s 165ms/step - loss: 0.1208 - accuracy: 0.9651 - val_loss: 0.1172 - val_accuracy: 0.9665
Epoch 4/10
780/780 [==============================] - 139s 178ms/step - loss: 0.1033 - accuracy: 0.9703 - val_loss: 0.1096 - val_accuracy: 0.9669
Epoch 5/10
780/780 [==============================] - 134s 172ms/step - loss: 0.0915 - accuracy: 0.9731 - val_loss: 0.0855 - val_accuracy: 0.9777
Epoch 6/10
780/780 [==============================] - 131s 168ms/step - loss: 0.0867 - accuracy: 0.9741 - val_loss: 0.0778 - val_accuracy: 0.9812
Epoch 7/10
780/780 [==============================] - 139s 178ms/step - loss: 0.0770 - accuracy: 0.9763 - val_loss: 0.0660 - val_accuracy: 0.9842
Epoch 8/10
780/780 [==============================] - 133s 171ms/step - loss: 0.0736 - accuracy: 0.9764 - val_loss: 0.0598 - val_accuracy: 0.9812
Epoch 9/10
780/780 [==============================] - 128s 165ms/step - loss: 0.0720 - accuracy: 0.9766 - val_loss: 0.0663 - val_accuracy: 0.9804
Epoch 10/10
780/780 [==============================] - 133s 171ms/step - loss: 0.0680 - accuracy: 0.9784 - val_loss: 0.0581 - val_accuracy: 0.9754

Evaluating the model on test data

In [ ]:
test_accuracy = model.evaluate(test_images, test_labels)
print("Test Accuracy:", test_accuracy)
82/82 [==============================] - 5s 58ms/step - loss: 0.0581 - accuracy: 0.9754
Test Accuracy: [0.05805911868810654, 0.9753845930099487]

Plotting the confusion matrix

In [ ]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

#Predicting the classes for the test images
y_pred = model.predict(test_images)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true_classes = np.argmax(test_labels, axis=1)

# Printing the classification report will be useful too
print(classification_report(y_true_classes, y_pred_classes))
82/82 [==============================] - 3s 35ms/step
              precision    recall  f1-score   support

           0       0.96      0.99      0.98      1300
           1       0.99      0.96      0.97      1300

    accuracy                           0.98      2600
   macro avg       0.98      0.98      0.98      2600
weighted avg       0.98      0.98      0.98      2600

In [ ]:
cm = confusion_matrix(y_true_classes, y_pred_classes)
#Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

Plotting the train and validation curves

In [ ]:
#Plotting the train and validation accuracy
plt.figure(figsize=(8, 6))
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

So now let's try to build another model with few more add on layers and try to check if we can try to improve the model. Let's build a model by adding few layers if required and altering the activation functions.

Model 1

Trying to improve the performance of our model by adding new layers

In [ ]:
#Clearing the backend session
backend.clear_session()

# Setting the random seed to standardize outputs
np.random.seed(25)
random.seed(25)
tf.random.set_seed(25)

Building the Model

In [ ]:
# Define the model
model1 = Sequential()

# Add convolutional layers, changed activation functions to tanh
model1.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh", input_shape=(64, 64, 3)))
model1.add(MaxPooling2D(pool_size=2))
model1.add(Dropout(0.2))

model1.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh"))
model1.add(MaxPooling2D(pool_size=2))
model1.add(Dropout(0.2))

model1.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh"))
model1.add(MaxPooling2D(pool_size=2))
model1.add(Dropout(0.2))

#New layers
model1.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh"))
model1.add(MaxPooling2D(pool_size=2))
model1.add(Dropout(0.2))

# Flatten to convert 2D features to 1D vector for fully connected layers
model1.add(Flatten())

# Add dense layer
model1.add(Dense(128, activation="tanh"))
model1.add(Dropout(0.5))

#Changed the output activation to sigmoid as this is a binary classification problem (infected or not infected)
#Labels for this model are not One hot encoded as maintaining the binary 1D array is necessary for an output layer of 1
model1.add(Dense(1, activation="sigmoid"))

# Print the model summary
model1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 64, 64, 32)        416       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 32)       0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 32, 32, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4128      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 dropout_1 (Dropout)         (None, 16, 16, 32)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 32)        4128      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 8, 8, 32)         0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 8, 8, 32)          0         
                                                                 
 conv2d_3 (Conv2D)           (None, 8, 8, 32)          4128      
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 4, 4, 32)         0         
 2D)                                                             
                                                                 
 dropout_3 (Dropout)         (None, 4, 4, 32)          0         
                                                                 
 flatten (Flatten)           (None, 512)               0         
                                                                 
 dense (Dense)               (None, 128)               65664     
                                                                 
 dropout_4 (Dropout)         (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 78,593
Trainable params: 78,593
Non-trainable params: 0
_________________________________________________________________

Compiling the model

In [ ]:
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Using Callbacks

In [ ]:
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),
    ModelCheckpoint(filepath='.mdl_wts.hdf5', monitor='val_loss', save_best_only=True)
]

Fit and Train the model

In [ ]:
history1 = model1.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=callbacks, epochs=10)
Epoch 1/10
780/780 [==============================] - 148s 188ms/step - loss: 0.4278 - accuracy: 0.7840 - val_loss: 0.5854 - val_accuracy: 0.8058
Epoch 2/10
780/780 [==============================] - 141s 181ms/step - loss: 0.1592 - accuracy: 0.9437 - val_loss: 0.2362 - val_accuracy: 0.9077
Epoch 3/10
780/780 [==============================] - 141s 181ms/step - loss: 0.1259 - accuracy: 0.9576 - val_loss: 0.1576 - val_accuracy: 0.9473
Epoch 4/10
780/780 [==============================] - 141s 181ms/step - loss: 0.1092 - accuracy: 0.9636 - val_loss: 0.1092 - val_accuracy: 0.9573
Epoch 5/10
780/780 [==============================] - 139s 178ms/step - loss: 0.1002 - accuracy: 0.9664 - val_loss: 0.0842 - val_accuracy: 0.9746
Epoch 6/10
780/780 [==============================] - 141s 180ms/step - loss: 0.0957 - accuracy: 0.9687 - val_loss: 0.0851 - val_accuracy: 0.9738
Epoch 7/10
780/780 [==============================] - 146s 187ms/step - loss: 0.0924 - accuracy: 0.9707 - val_loss: 0.0749 - val_accuracy: 0.9773
Epoch 8/10
780/780 [==============================] - 139s 178ms/step - loss: 0.0868 - accuracy: 0.9715 - val_loss: 0.0676 - val_accuracy: 0.9823
Epoch 9/10
780/780 [==============================] - 141s 181ms/step - loss: 0.0892 - accuracy: 0.9717 - val_loss: 0.0625 - val_accuracy: 0.9792
Epoch 10/10
780/780 [==============================] - 141s 181ms/step - loss: 0.0833 - accuracy: 0.9733 - val_loss: 0.0585 - val_accuracy: 0.9835

Evaluating the model

In [ ]:
test_accuracy1 = model1.evaluate(test_images, test_labels)
print("Test Accuracy:", test_accuracy1)
82/82 [==============================] - 3s 39ms/step - loss: 0.0585 - accuracy: 0.9835
Test Accuracy: [0.05848436802625656, 0.9834615588188171]

Plotting the confusion matrix

In [ ]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
#Predicting the classes for the test images
y_pred_classes = (model1.predict(test_images) > 0.5).astype(int)

# Printing the classification report will be useful too
print(classification_report(test_labels, y_pred_classes))
82/82 [==============================] - 5s 55ms/step
              precision    recall  f1-score   support

           0       0.99      0.98      0.98      1300
           1       0.98      0.99      0.98      1300

    accuracy                           0.98      2600
   macro avg       0.98      0.98      0.98      2600
weighted avg       0.98      0.98      0.98      2600

In [ ]:
cm = confusion_matrix(test_labels, y_pred_classes)
#Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

Plotting the train and the validation curves

In [ ]:
#Plotting the train and validation accuracy
plt.figure(figsize=(8, 6))
plt.plot(history1.history['accuracy'], label='Train Accuracy')
plt.plot(history1.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

The accuracy of the model has not improved. The computational cost of using the tanh activation is higher than the relu activation so its worth reverting back. Using a sigmoid function activation in the output layer has also not impacted the accuracy of the model but maintining this may be an appropriate decision given the problem statement and how it is a binary classification problem.

Let us try to build a model using BatchNormalization and using LeakyRelu as our activation function.

Model 2 with Batch Normalization

In [ ]:
#Clearing the backend session
backend.clear_session()

# Setting the random seed to standardize outputs
np.random.seed(25)
random.seed(25)
tf.random.set_seed(25)

Building the Model

In [ ]:
from keras.layers import LeakyReLU, BatchNormalization

#Define the model
model2 = Sequential()

#Add convolutional layers, changed activation functions to LeakyRelu and added Batch Normalization layers
model2.add(Conv2D(32, kernel_size=2, padding='same', input_shape=(64, 64, 3)))
model2.add(LeakyReLU(alpha=0.01))
model2.add(BatchNormalization())
model2.add(MaxPooling2D((2, 2)))

model2.add(Conv2D(32, kernel_size=2, padding='same'))
model2.add(LeakyReLU(alpha=0.01))
model2.add(BatchNormalization())
model2.add(MaxPooling2D((2, 2)))

model2.add(Conv2D(32, kernel_size=2, padding='same'))
model2.add(LeakyReLU(alpha=0.01))
model2.add(BatchNormalization())
model2.add(MaxPooling2D((2, 2)))

model2.add(Conv2D(32, kernel_size=2, padding='same'))
model2.add(LeakyReLU(alpha=0.01))
model2.add(BatchNormalization())
model2.add(MaxPooling2D((2, 2)))

#Flatten to convert 2D features to 1D vector for fully connected layers
model2.add(Flatten())

model2.add(Dense(128))
model2.add(LeakyReLU(alpha=0.01))
model2.add(BatchNormalization())
model2.add(Dropout(0.5))

#Changed the output activation to sigmoid as this is a binary classification problem (infected or not infected)
#Labels for this model are not One hot encoded as maintaining the binary 1D array is necessary for an output layer of 1
model2.add(Dense(1, activation='sigmoid'))

#Print the model summary
model2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 64, 64, 32)        416       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 64, 64, 32)        0         
                                                                 
 batch_normalization (BatchN  (None, 64, 64, 32)       128       
 ormalization)                                                   
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4128      
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 32, 32, 32)        0         
                                                                 
 batch_normalization_1 (Batc  (None, 32, 32, 32)       128       
 hNormalization)                                                 
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 32)        4128      
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 16, 16, 32)        0         
                                                                 
 batch_normalization_2 (Batc  (None, 16, 16, 32)       128       
 hNormalization)                                                 
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 8, 8, 32)         0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 8, 8, 32)          4128      
                                                                 
 leaky_re_lu_3 (LeakyReLU)   (None, 8, 8, 32)          0         
                                                                 
 batch_normalization_3 (Batc  (None, 8, 8, 32)         128       
 hNormalization)                                                 
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 4, 4, 32)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 512)               0         
                                                                 
 dense (Dense)               (None, 128)               65664     
                                                                 
 leaky_re_lu_4 (LeakyReLU)   (None, 128)               0         
                                                                 
 batch_normalization_4 (Batc  (None, 128)              512       
 hNormalization)                                                 
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 79,617
Trainable params: 79,105
Non-trainable params: 512
_________________________________________________________________

Compiling the model

In [ ]:
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Using callbacks

In [ ]:
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),
    ModelCheckpoint(filepath='.mdl_wts.hdf5', monitor='val_loss', save_best_only=True)
]

Fit and train the model

In [ ]:
history2 = model2.fit(train_images, train_labels, validation_data=(test_images, test_labels), callbacks=callbacks, epochs=10)
Epoch 1/10
780/780 [==============================] - 196s 237ms/step - loss: 0.1825 - accuracy: 0.9286 - val_loss: 0.0782 - val_accuracy: 0.9712
Epoch 2/10
780/780 [==============================] - 183s 235ms/step - loss: 0.0864 - accuracy: 0.9715 - val_loss: 0.0668 - val_accuracy: 0.9808
Epoch 3/10
780/780 [==============================] - 181s 232ms/step - loss: 0.0784 - accuracy: 0.9730 - val_loss: 0.0513 - val_accuracy: 0.9862
Epoch 4/10
780/780 [==============================] - 184s 236ms/step - loss: 0.0682 - accuracy: 0.9770 - val_loss: 0.1631 - val_accuracy: 0.9600
Epoch 5/10
780/780 [==============================] - 182s 233ms/step - loss: 0.0663 - accuracy: 0.9778 - val_loss: 0.0881 - val_accuracy: 0.9762

Plotting the train and validation accuracy

In [ ]:
#Plotting the train and validation accuracy
plt.figure(figsize=(8, 6))
plt.plot(history2.history['accuracy'], label='Train Accuracy')
plt.plot(history2.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

The model is learning and generalizing well initially but begins to overfit after the second epoch, as the validation loss began increasing after that. The early stopping callback stopped training after two epochs of increasing validation loss which helped prevent further overfitting.

Evaluating the model

In [ ]:
test_accuracy2 = model2.evaluate(test_images, test_labels)
print("Test Accuracy:", test_accuracy2)
82/82 [==============================] - 5s 59ms/step - loss: 0.0881 - accuracy: 0.9762
Test Accuracy: [0.05848436802625656, 0.9834615588188171]

Observations and insights: __

Despite the evaluation showing a high accuracy, the plot of the accuracies indicate a problem.

The model is learning and generalizing well initially but begins to overfit after the second epoch, as the validation loss began increasing after that. The early stopping callback stopped training after two epochs of increasing validation loss which helped prevent further overfitting.

Generate the classification report and confusion matrix

In [ ]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
#Predicting the classes for the test images
y_pred_classes = (model2.predict(test_images) > 0.5).astype(int)

# Printing the classification report will be useful too
print(classification_report(test_labels, y_pred_classes))
82/82 [==============================] - 7s 86ms/step
              precision    recall  f1-score   support

           0       0.96      0.99      0.98      1300
           1       0.99      0.96      0.98      1300

    accuracy                           0.98      2600
   macro avg       0.98      0.98      0.98      2600
weighted avg       0.98      0.98      0.98      2600

In [ ]:
cm = confusion_matrix(test_labels, y_pred_classes)
#Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

Data Augmentation can help with the overfitting problem by creating new training samples with rotations, and other transformations of the image dataset. This could make the ML model more robust.

Model 3 with Data Augmentation

In [ ]:
#Clearing the backend session
backend.clear_session()

# Setting the random seed to standardize outputs
np.random.seed(25)
random.seed(25)
tf.random.set_seed(25)

Use image data generator

In [ ]:
#Only using image data generator on the train data set, test data is untouched so the the model can be evaluated against real data

from keras.preprocessing.image import ImageDataGenerator

#Define parameters
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, zoom_range=0.5, rotation_range=45)

#prepare an iterators to scale images
train_iterator = datagen.flow(train_images, train_labels, batch_size=64, seed = 25, shuffle = True)

#Steps per epoch = no. of unique images in train data/batch size = 24958/64 (rounded up)
steps_per_epoch = 390

Visualizing Augmented images

In [ ]:
#Retrieving one batch of images
for X_batch, y_batch in train_iterator:
    # Create a grid of 3x3 images
    for i in range(0, 9):
        plt.subplot(330 + 1 + i)
        plt.imshow(X_batch[i].astype('float32'))
        #To display labels
        if np.argmax(y_batch[i]) == 0:
            plt.title("Uninfected")
        else:
            plt.title("Parasitized")
    # Show the plot
    plt.show()
    break

Observations and insights: __

The images are now in irregular shapes and orientations, creating a more diverse training set of data.

Building the Model

In [ ]:
#Reverting back to a model without Batch Normalization and Leaky Relu to avoid overfitting issue
#Reverting back to relu instead of tanh to reduce computational cost
#Maintaining sigmoid function in output layer
#Removing extra layers (using same amount of layers as base model) as there was no increase in accuracy from adding additional layers

#Define the model
model3 = Sequential()

#Add convolutional layers
model3.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation='relu', input_shape=(64, 64, 3)))
model3.add(MaxPooling2D(pool_size = 2))
model3.add(Dropout(0.2))

model3.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation='relu'))
model3.add(MaxPooling2D(pool_size = 2))
model3.add(Dropout(0.2))

model3.add(Conv2D(filters = 32, kernel_size = 2, padding = "same", activation='relu'))
model3.add(MaxPooling2D(pool_size = 2))
model3.add(Dropout(0.2))

#Flatten to convert 2D features to 1D vector for fully connect layers
model3.add(Flatten())

#Add dense layers
model3.add(Dense(128, activation='relu'))
model3.add(Dropout(0.5))

#Labels for this model are not One hot encoded as maintaining the binary 1D array is necessary for an output layer of 1
model3.add(Dense(1, activation='sigmoid'))

# Print the model summary
model3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 64, 64, 32)        416       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 32)       0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 32, 32, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4128      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 dropout_1 (Dropout)         (None, 16, 16, 32)        0         
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 32)        4128      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 8, 8, 32)         0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 8, 8, 32)          0         
                                                                 
 flatten (Flatten)           (None, 2048)              0         
                                                                 
 dense (Dense)               (None, 128)               262272    
                                                                 
 dropout_3 (Dropout)         (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 271,073
Trainable params: 271,073
Non-trainable params: 0
_________________________________________________________________
In [ ]:
model3.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Using Callbacks

In [ ]:
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),
    ModelCheckpoint(filepath='.mdl_wts.hdf5', monitor='val_loss', save_best_only=True)
]

Fit and Train the model

In [ ]:
history3 = model3.fit(train_iterator, steps_per_epoch=steps_per_epoch, validation_data=(test_images, test_labels), callbacks=callbacks, epochs=10)
Epoch 1/10
390/390 [==============================] - 163s 414ms/step - loss: 0.5752 - accuracy: 0.6919 - val_loss: 0.2370 - val_accuracy: 0.9258
Epoch 2/10
390/390 [==============================] - 162s 416ms/step - loss: 0.2510 - accuracy: 0.9035 - val_loss: 0.1815 - val_accuracy: 0.9381
Epoch 3/10
390/390 [==============================] - 162s 415ms/step - loss: 0.2154 - accuracy: 0.9240 - val_loss: 0.1741 - val_accuracy: 0.9385
Epoch 4/10
390/390 [==============================] - 160s 409ms/step - loss: 0.1918 - accuracy: 0.9322 - val_loss: 0.1333 - val_accuracy: 0.9596
Epoch 5/10
390/390 [==============================] - 162s 416ms/step - loss: 0.1954 - accuracy: 0.9348 - val_loss: 0.1148 - val_accuracy: 0.9731
Epoch 6/10
390/390 [==============================] - 162s 415ms/step - loss: 0.1825 - accuracy: 0.9401 - val_loss: 0.1170 - val_accuracy: 0.9742
Epoch 7/10
390/390 [==============================] - 159s 407ms/step - loss: 0.1779 - accuracy: 0.9409 - val_loss: 0.1122 - val_accuracy: 0.9638
Epoch 8/10
390/390 [==============================] - 162s 415ms/step - loss: 0.1733 - accuracy: 0.9422 - val_loss: 0.0942 - val_accuracy: 0.9792
Epoch 9/10
390/390 [==============================] - 161s 413ms/step - loss: 0.1671 - accuracy: 0.9441 - val_loss: 0.0862 - val_accuracy: 0.9788
Epoch 10/10
390/390 [==============================] - 163s 418ms/step - loss: 0.1650 - accuracy: 0.9442 - val_loss: 0.0899 - val_accuracy: 0.9696

Evaluating the model

Plot the train and validation accuracy

In [ ]:
#Plotting the train and validation accuracy
plt.figure(figsize=(8, 6))
plt.plot(history3.history['accuracy'], label='Train Accuracy')
plt.plot(history3.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

A higher validation accuracy is a good thing in this scenario as the train set is inclusive of irregular images that have been augmented. If the model is able to maintain a higher accuracy on the untampered validation set, it means that the model's training on augmented data has made it more robust when looking at real images.

Plotting the classification report and confusion matrix

In [ ]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
#Predicting the classes for the test images
y_pred_classes = (model3.predict(test_images) > 0.5).astype(int)

# Printing the classification report will be useful too
print(classification_report(test_labels, y_pred_classes))
82/82 [==============================] - 3s 38ms/step
              precision    recall  f1-score   support

           0       0.95      0.99      0.97      1300
           1       0.99      0.95      0.97      1300

    accuracy                           0.97      2600
   macro avg       0.97      0.97      0.97      2600
weighted avg       0.97      0.97      0.97      2600

In [ ]:
cm = confusion_matrix(test_labels, y_pred_classes)
#Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

Now, let us try to use a pretrained model like VGG16 and check how it performs on our data.

Pre-trained model (VGG16)¶

  • Importing VGG16 network upto any layer I choose
  • Added Fully Connected Layers on top of it
In [ ]:
#Clearing the backend session
backend.clear_session()

# Setting the random seed to standardize outputs
np.random.seed(25)
random.seed(25)
tf.random.set_seed(25)

Building the model

In [ ]:
from keras.applications import VGG16

# Load the VGG model
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))


vgg_conv.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 64, 64, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 64, 64, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 64, 64, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 32, 32, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 32, 32, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 32, 32, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 16, 16, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 16, 16, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 16, 16, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 16, 16, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 8, 8, 256)         0         
                                                                 
 block4_conv1 (Conv2D)       (None, 8, 8, 512)         1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 8, 8, 512)         2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 8, 8, 512)         2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 4, 4, 512)         0         
                                                                 
 block5_conv1 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 2, 2, 512)         0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
In [ ]:
#Lets choose upto block3_conv1, as further layers will take longer to fit due to significantly higher number of parameters
from tensorflow.keras.models import Model

output_layer = vgg_conv.get_layer('block3_conv1').output

#We dont want to train the pre-existing layers
vgg_conv.trainable = False

#Initializing out model to have the vgg upto the block3_pool layer
model4 = Model(inputs=vgg_conv.input, outputs=output_layer)

#Adding layers to our own model
model4 = Sequential([
    model4,
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

Compiling the model

In [ ]:
model4.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

using callbacks

In [ ]:
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),
    ModelCheckpoint(filepath='.mdl_wts.hdf5', monitor='val_loss', save_best_only=True)
]

Fit and Train the model

In [ ]:
history4 = model4.fit(train_images, train_labels, validation_data=(test_images, test_labels), batch_size = 64, callbacks=callbacks, epochs=10)
Epoch 1/10
390/390 [==============================] - 778s 2s/step - loss: 8.0170 - accuracy: 0.7205 - val_loss: 0.4966 - val_accuracy: 0.8565
Epoch 2/10
390/390 [==============================] - 770s 2s/step - loss: 0.5482 - accuracy: 0.6939 - val_loss: 0.3484 - val_accuracy: 0.8958
Epoch 3/10
390/390 [==============================] - 759s 2s/step - loss: 0.4776 - accuracy: 0.7108 - val_loss: 0.4345 - val_accuracy: 0.9092
Epoch 4/10
390/390 [==============================] - 758s 2s/step - loss: 0.5369 - accuracy: 0.7106 - val_loss: 0.3711 - val_accuracy: 0.9158

The 'val_loss' at the 2nd epoch is 0.3484, but it increases to 0.4345 at the 3rd epoch and further to 0.3711 at the 4th epoch. Since the 'val_loss' does not improve for 2 consecutive epochs ('patience'=2), the EarlyStopping callback stops the training after the 4th epoch.

Plot the train and validation accuracy

In [ ]:
#Plotting the train and validation accuracy
plt.figure(figsize=(8, 6))
plt.plot(history4.history['accuracy'], label='Train Accuracy')
plt.plot(history4.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Observations and insights: _¶

High validation accuracy indicates that the model performs well on unseen data however the difference in the training accuracy indicates that there is a case of underfitting. The model is not learning effectively from the training data.

Evaluating the model

In [ ]:
test_accuracy4 = model4.evaluate(test_images, test_labels)
print("Test Accuracy:", test_accuracy4)
82/82 [==============================] - 57s 694ms/step - loss: 0.3711 - accuracy: 0.9158
Test Accuracy: [0.3710765540599823, 0.9157692193984985]

Plotting the classification report and confusion matrix

In [ ]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
#Predicting the classes for the test images
y_pred_classes = (model4.predict(test_images) > 0.5).astype(int)

# Printing the classification report will be useful too
print(classification_report(test_labels, y_pred_classes))
82/82 [==============================] - 56s 684ms/step
              precision    recall  f1-score   support

           0       0.93      0.90      0.91      1300
           1       0.91      0.93      0.92      1300

    accuracy                           0.92      2600
   macro avg       0.92      0.92      0.92      2600
weighted avg       0.92      0.92      0.92      2600

In [ ]:
cm = confusion_matrix(test_labels, y_pred_classes)
#Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

The accuracy of the VGG pretrained model has dropped compared to other models. This could be due to not using more layers, however it is indicative that the pretrained image data is not related to the image dataset that we are training. There are a higher amount of false positives and false negatives in this model indicating that it is making more mistakes in classification of the cell images compared to other models.

In [ ]:
#Chose the model with the best accuracy scores from all the above models and saved it as a final model.
final_model = history1

Observations and Conclusions drawn from the final model:

Model 1 (the one after the base model) has had the best performance compared to any other models created with an accuracy of 98%. The base model also had an accuracy of 98% however, when comparing the confusion matrix of both models, the base model has a higher number of false negatives(52) than model 1 (12). This is important in this scenario as a false negative of malaria detection can lead to a patient with malaria being misdiagnosed and not receiving the treatment required and so a minimal false negative rate is highly desirable.

Improvements that can be done:

It is likely that a pre-trained model that is specialised in cell images may result in a higher accuracy of this dataset e.g. the Keras R-CNN model that has been specifically trained to identify and classify a large number of cells.

Let's try the model1 architecture using the HSV images to see if there is a noticeable difference in performance.

HSV Image Model

In [ ]:
#Clearing the backend session
backend.clear_session()

# Setting the random seed to standardize outputs
np.random.seed(25)
random.seed(25)
tf.random.set_seed(25)
In [ ]:
#Define the model/ rename model1 to model_hsv
model_hsv = Sequential()

# Add convolutional layers, changed activation functions to tanh
model_hsv.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh", input_shape=(64, 64, 3)))
model_hsv.add(MaxPooling2D(pool_size=2))
model_hsv.add(Dropout(0.2))

model_hsv.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh"))
model_hsv.add(MaxPooling2D(pool_size=2))
model_hsv.add(Dropout(0.2))

model_hsv.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh"))
model_hsv.add(MaxPooling2D(pool_size=2))
model_hsv.add(Dropout(0.2))

#New layers
model_hsv.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="tanh"))
model_hsv.add(MaxPooling2D(pool_size=2))
model_hsv.add(Dropout(0.2))

# Flatten to convert 2D features to 1D vector for fully connected layers
model_hsv.add(Flatten())

# Add dense layer
model_hsv.add(Dense(128, activation="tanh"))
model_hsv.add(Dropout(0.5))

#Changed the output activation to sigmoid as this is a binary classification problem (infected or not infected)
#Labels for this model are not One hot encoded as maintaining the binary 1D array is necessary for an output layer of 1
model_hsv.add(Dense(1, activation="sigmoid"))

# Print the model summary
model_hsv.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_4 (Conv2D)           (None, 64, 64, 32)        416       
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 32, 32, 32)       0         
 2D)                                                             
                                                                 
 dropout_5 (Dropout)         (None, 32, 32, 32)        0         
                                                                 
 conv2d_5 (Conv2D)           (None, 32, 32, 32)        4128      
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 dropout_6 (Dropout)         (None, 16, 16, 32)        0         
                                                                 
 conv2d_6 (Conv2D)           (None, 16, 16, 32)        4128      
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 8, 8, 32)         0         
 2D)                                                             
                                                                 
 dropout_7 (Dropout)         (None, 8, 8, 32)          0         
                                                                 
 conv2d_7 (Conv2D)           (None, 8, 8, 32)          4128      
                                                                 
 max_pooling2d_7 (MaxPooling  (None, 4, 4, 32)         0         
 2D)                                                             
                                                                 
 dropout_8 (Dropout)         (None, 4, 4, 32)          0         
                                                                 
 flatten_1 (Flatten)         (None, 512)               0         
                                                                 
 dense_1 (Dense)             (None, 128)               65664     
                                                                 
 dropout_9 (Dropout)         (None, 128)               0         
                                                                 
 dense_2 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 78,593
Trainable params: 78,593
Non-trainable params: 0
_________________________________________________________________
In [ ]:
model_hsv.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
In [ ]:
callbacks = [
    EarlyStopping(monitor='val_loss', patience=2),
    ModelCheckpoint(filepath='.mdl_wts.hdf5', monitor='val_loss', save_best_only=True)
]
In [ ]:
history_hsv = model_hsv.fit(train_images_hsv, train_labels, validation_data=(test_images_hsv, test_labels), callbacks=callbacks, epochs=10)
Epoch 1/10
780/780 [==============================] - 167s 210ms/step - loss: 0.7204 - accuracy: 0.5201 - val_loss: 0.6854 - val_accuracy: 0.5450
Epoch 2/10
780/780 [==============================] - 190s 244ms/step - loss: 0.6871 - accuracy: 0.5513 - val_loss: 0.6837 - val_accuracy: 0.5554
Epoch 3/10
780/780 [==============================] - 175s 224ms/step - loss: 0.6752 - accuracy: 0.5742 - val_loss: 0.7069 - val_accuracy: 0.5046
Epoch 4/10
780/780 [==============================] - 144s 184ms/step - loss: 0.6714 - accuracy: 0.5807 - val_loss: 0.6920 - val_accuracy: 0.5362
In [ ]:
test_accuracy_hsv = model_hsv.evaluate(test_images_hsv, test_labels)
print("Test Accuracy:", test_accuracy_hsv)
82/82 [==============================] - 3s 38ms/step - loss: 0.6920 - accuracy: 0.5362
Test Accuracy: [0.6920151710510254, 0.5361538529396057]
In [ ]:
#Plotting the train and validation accuracy
plt.figure(figsize=(8, 6))
plt.plot(history_hsv.history['accuracy'], label='Train Accuracy')
plt.plot(history_hsv.history['val_accuracy'], label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
In [ ]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
#Predicting the classes for the test images
y_pred_classes = (model_hsv.predict(test_images_hsv) > 0.5).astype(int)

# Printing the classification report will be useful too
print(classification_report(test_labels, y_pred_classes))
82/82 [==============================] - 3s 37ms/step
              precision    recall  f1-score   support

           0       0.63      0.18      0.28      1300
           1       0.52      0.89      0.66      1300

    accuracy                           0.54      2600
   macro avg       0.57      0.54      0.47      2600
weighted avg       0.57      0.54      0.47      2600

In [ ]:
cm = confusion_matrix(test_labels, y_pred_classes)
#Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

The model using HSV images has performed much worse, with an accuracy of 54%. This indicates that using HSV images may not be the appropriate approach for training models to detect infected cells.

Insights¶

Refined insights:¶

  1. This is a binary classification problem looking at 2 categories - infected and non infected cells.
  2. The training data provided a balanced dataset of an equal amount of images of infected and non infected cells.
  3. From visualizations of the cells, it is clear that the most distinctive feature highlighting the difference of the cells is the discolouration of the infected cells which have darker pinks/purples within the cell walls.
  4. Models performed best when using the original dataset. Altering the images to assist with detection (HSV Images/ Data Augmentation) led to worse predictive accuracies with the models.

Comparison of various techniques and their relative performance:¶

Model1 (using tanh activation and sigmoid function in the output layer) performed best with an accuracy of 98%. The pretrained model performed the worst with an accuracy of 92% however this could have been due to not implementing all layers as with more parameters and more epochs, there could have been an increase in predictive accuracy. At 98% accuracy, scope for improvement is minor but possibly achievable by using transfer learning from a pretrained model that is specialised to predict cell images.

Proposal for the final solution design:¶

I propose model1 to be adopted as it has the highest predictive accuracy and the fewest false negatives in the confusion matrix. This means that this model was able to classify 98% percent of the data accurately and within the 2% that it wasnt, the instances of not detecting malaria when in fact there is malaria is kept to a minimum (12 instances of false negatives) which ensures that patients who have malaria are very rarely misdiagnosed and hence can receive the medical attention they require.